optimality principle

Generating Realistic Arm Movements in Reinforcement Learning: A Quantitative Comparison of Reward Terms and Task Requirements

Charaja, Jhon, Wochner, Isabell, Schumacher, Pierre, Ilg, Winfried, Giese, Martin, Maufroy, Christophe, Bulling, Andreas, Schmitt, Syn, Haeufle, Daniel F. B.

arXiv.org Artificial Intelligence

Mimicking human-like arm movement characteristics involves three factors during control policy synthesis: (a) the chosen task requirements, (b) the inclusion of noise during movement execution, and (c) the chosen optimality principles. Previous studies showed that, when considering these factors (a-c) individually, it is possible to synthesize arm movements that either kinematically match experimental data or reproduce the stereotypical triphasic muscle activation pattern. However, to date, no quantitative comparison has been made of how realistic the arm movements generated by each factor are, nor whether a partial or full combination of all factors results in arm movements with human-like kinematic characteristics and a triphasic muscle pattern. To investigate this, we used reinforcement learning to learn a control policy for a musculoskeletal arm model, aiming to discern which combination of factors (a-c) results in realistic arm movements according to four frequently reported stereotypical characteristics. Our findings indicate that incorporating velocity and acceleration requirements into the reaching task, employing reward terms that encourage minimization of mechanical work, hand jerk, and control effort, and including noise during movement leads to the emergence of realistic human arm movements in reinforcement learning. We expect these insights will help to better predict desired arm movements and corrective forces in wearable assistive devices.
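The combination of reward terms described in the abstract can be sketched as a weighted sum of tracking terms and effort penalties. The function name, term decomposition, and weights below are illustrative assumptions, not the paper's actual formulation:

```python
def arm_reward(pos_err, vel_err, acc_err,
               mech_work, hand_jerk, ctrl_effort,
               w=(1.0, 0.5, 0.5, 0.1, 0.1, 0.1)):
    """Composite reward: task-requirement tracking terms (position,
    velocity, acceleration errors) plus penalties on mechanical work,
    hand jerk, and control effort. Weights `w` are hypothetical."""
    task = -(w[0] * pos_err**2 + w[1] * vel_err**2 + w[2] * acc_err**2)
    effort = -(w[3] * mech_work + w[4] * hand_jerk + w[5] * ctrl_effort)
    return task + effort
```

A perfect, effortless reach scores 0, and any tracking error or expended effort makes the reward negative, so the policy is pushed toward both kinematic accuracy and effort minimization.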


Optimality Principles in Spacecraft Neural Guidance and Control

Izzo, Dario, Blazquez, Emmanuel, Ferede, Robin, Origer, Sebastien, De Wagter, Christophe, de Croon, Guido C. H. E.

arXiv.org Artificial Intelligence

Spacecraft and drones aimed at exploring our solar system are designed to operate in conditions where the smart use of onboard resources is vital to the success or failure of the mission. Sensorimotor actions are thus often derived from high-level, quantifiable optimality principles assigned to each task, using consolidated tools from optimal control theory. The planned actions are derived on the ground and transferred onboard, where controllers have the task of tracking the uploaded guidance profile. Here we argue that end-to-end neural guidance and control architectures (here called G&CNets) allow the burden of acting upon these optimality principles to be transferred onboard. In this way, sensor information is transformed in real time into optimal plans, thus increasing mission autonomy and robustness. We discuss the main results obtained in training such neural architectures in simulation for interplanetary transfers, landings, and close-proximity operations, highlighting the successful learning of optimality principles by the neural model. We then suggest drone racing as an ideal gym environment for testing these architectures on real robotic platforms, thus increasing confidence in their utilization on future space exploration missions. Drone racing shares with spacecraft missions both limited onboard computational capabilities and similar control structures induced by the optimality principle sought, but it also entails different levels of uncertainty and unmodelled effects. Furthermore, the success of G&CNets on extremely resource-restricted drones illustrates their potential to bring real-time optimal control within reach of a wider variety of robotic systems, both in space and on Earth.


An Optimality Principle for Unsupervised Learning

Sanger, Terence D.

Neural Information Processing Systems

We propose an optimality principle for training an unsupervised feedforward neural network based upon maximal ability to reconstruct the input data from the network outputs. We describe an algorithm which can be used to train either linear or nonlinear networks with certain types of nonlinearity. Examples of applications to the problems of image coding, feature detection, and analysis of random-dot stereograms are presented.
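In the linear case, this reconstruction principle can be sketched as gradient descent on the mean squared error between the input and its reconstruction from the network outputs. The toy setup below (data dimensions, learning rate, iteration count) is an illustrative assumption, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated inputs
W = rng.normal(scale=0.1, size=(2, 5))                   # 2 linear output units

lr = 0.005
for _ in range(2000):
    Y = X @ W.T              # network outputs
    E = X - Y @ W            # error reconstructing the input from the outputs
    # gradient of (1/2N) * ||X - X W^T W||^2 with respect to W
    grad = -(Y.T @ E + W @ E.T @ X) / len(X)
    W -= lr * grad

recon_err = np.mean((X - (X @ W.T) @ W) ** 2)
```

Training drives `recon_err` well below the no-reconstruction baseline `np.mean(X**2)`: with fewer output units than inputs, maximizing reconstruction ability makes the rows of `W` span the principal subspace of the data.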


On Bellman's Optimality Principle for zs-POSGs

Buffet, Olivier, Dibangoye, Jilles, Delage, Aurélien, Saffidine, Abdallah, Thomas, Vincent

arXiv.org Artificial Intelligence

Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem. Here we show how it can apply to (infinite horizon) 2-player zero-sum partially observable stochastic games (zs-POSGs) by (i) taking a central planner's viewpoint, which can only reason on a sufficient statistic called occupancy state, and (ii) turning such problems into zero-sum occupancy Markov games (zs-OMGs). Then, exploiting the Lipschitz-continuity of the value function in occupancy space, one can derive a version of the HSVI algorithm (Heuristic Search Value Iteration) that provably finds an $\epsilon$-Nash equilibrium in finite time.
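In the fully observable special case, Bellman's optimality principle reduces the sequential problem to nested one-step subproblems that value iteration solves by repeated backups. The two-state MDP below is a hypothetical toy illustration of that principle, not the zs-POSG setting of the paper:

```python
import numpy as np

# Toy MDP: P[a, s, s'] are transition probabilities, R[s, a] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):
    # Bellman optimality backup: V(s) = max_a [R(s,a) + gamma * E_{s'} V(s')]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V = Q.max(axis=1)
```

Because the backup is a gamma-contraction, the iterates converge to the unique optimal value function; here state 1's self-loop under action 1 yields V(1) = 2 / (1 - 0.9) = 20.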


Game-theoretic applications of a relational risk model

Urazaeva, Tatiana

arXiv.org Artificial Intelligence

The report proposes a concept of risk, outlining two mathematical structures necessary for risk genesis: a set of outcomes and, in the general case, a partial order of preference on it. It is shown that this minimal partial order should constitute the structure of a semilattice; in some cases, there should be a system of semilattices nested in a certain way. On this basis, a classification of risk-theory tasks is given in the context of the specialization of mathematical knowledge. In other words, we are talking about the development of a new relational risk theory. As an example of implementing the relational risk concept, the problem of political decision making is considered in a game-theoretic formulation, where each participant of the game has a partial order of preference on the set of outcomes, forming a certain system of nested semilattices. Solutions to the problem obtained through the use of various optimality principles are investigated.


Optimal Limited Contingency Planning

Meuleau, Nicolas, Smith, David

arXiv.org Artificial Intelligence

For a given problem, the optimal Markov policy can be considered as a conditional or contingent plan containing a (potentially large) number of branches. Unfortunately, there are applications where it is desirable to strictly limit the number of decision points and branches in a plan. For example, it may be that plans must later undergo more detailed simulation to verify correctness and safety, or that they must be simple enough to be understood and analyzed by humans. As a result, it may be necessary to limit consideration to plans with only a small number of branches. This raises the question of how one goes about finding optimal plans containing only a limited number of branches. In this paper, we present an anytime algorithm for optimal k-contingency planning (OKP). It is the first optimal algorithm for limited contingency planning that is not an explicit enumeration of possible contingent plans. By modelling the problem as a Partially Observable Markov Decision Process, it implements the Bellman optimality principle and prunes the solution space. We present experimental results of applying this algorithm to some simple test cases.
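The effect of limiting branch points can be illustrated on a toy two-stage problem (a hypothetical example, not the OKP algorithm itself): with zero contingencies the plan must commit to one action regardless of what is observed, while a single observation branch recovers the full value.

```python
from itertools import product

# Toy problem: nature draws an outcome o in {"H", "T"} uniformly, then the
# plan picks a bet; reward is 1 if the bet matches o. (Hypothetical example.)
outcomes, bets, p = ["H", "T"], ["H", "T"], 0.5

def value(plan):
    """Expected reward of `plan`, a function mapping the observation to a bet."""
    return sum((1.0 if plan(o) == o else 0.0) * p for o in outcomes)

# Best 0-contingency plan: one fixed bet, the observation is ignored.
v0 = max(value(lambda o, b=b: b) for b in bets)
# Best 1-branch contingent plan: a separate bet for each observation.
v1 = max(value(lambda o, m=dict(zip(outcomes, m)): m[o])
         for m in product(bets, repeat=2))
```

The unconditional plan achieves only 0.5 in expectation, while the single branch doubles it to 1.0, which is the gap that makes choosing where to spend a limited branch budget an optimization problem in its own right.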

